
Genetics Implementation#2734

Open
allisterakun wants to merge 35 commits into dev from genetics_implementation


@allisterakun
Collaborator

allisterakun commented Feb 5, 2026

Implements the Genetics submodule into the Animal module, replacing the simple net_merit attribute with a full Genetics class that tracks true breeding values (TBV), estimated breeding values (EBV), phenotypes, and environmental effects for fat and protein traits.

Context

Issue(s) closed by this pull request: closes #
Original PR: #2608. See design doc, PR description, and discussion there.

What

  • Integrated the Genetics class into the Animal class, replacing the net_merit float attribute with a full genetics model tracking TBV, EBV, phenotypes, permanent/temporary environmental effects, and a ranking index for fat and protein traits.
  • Added a GeneticHistory data type and update_genetic_history method on Animal to record genetic state at each simulation day.
  • Updated AnimalConfig to load average_phenotype and top_listing_semen datasets during initialization.
  • Updated Animal.__init__ to accept a RufasTime object instead of a simulation_day integer, and to initialize Genetics differently for newborn calves (with optional dam TBV values) vs. existing animals.
  • Updated MilkProductionStatistics to include genetic attributes (TBV, EBV, phenotype, environmental effects, ranking index).
  • Updated ReproductionInputs to pass dam_tbv_fat and dam_tbv_protein instead of net_merit.
  • Removed net_merit from all animal typed dicts (CalfValuesTypedDict, HeiferI/II/IIIValuesTypedDict, CowValuesTypedDict).
  • Removed animal_net_merit from REQUIRED_FILE_BLOBS in input_manager.py.
  • Moved and reorganized genetics-related input data files (e.g., NetMerit_HO.csv, TopListingSemen_HO.csv, mean_phenotype.csv) into input/data/animal/animal_genetics/.
  • Updated metadata JSON files (freestall, open lot, no animal, and e2e testing) to reference the new genetics data file paths.
  • Updated default.json properties file with genetics-related configuration.
  • Updated simulation_engine.py, herd_factory.py, herd_manager.py, and animal_module_reporter.py to accommodate the new genetics integration.
  • Updated unit tests across multiple test files to reflect the new genetics attributes, RufasTime parameter, and removed net_merit references.
  • Updated util.py with supporting changes.
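A minimal sketch of how a per-day GeneticHistory record and an update_genetic_history method could fit together. The field names mirror the attributes listed under Output Changes; the dataclass layout, the simulation_day field, and the SimpleGenetics / SketchAnimal stand-ins are hypothetical and are not the actual RuFaS definitions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class GeneticHistory:
    """Hypothetical snapshot of one animal's genetic state on one simulation day."""
    simulation_day: int
    TBV_fat: float
    TBV_protein: float
    EBV_fat: float
    EBV_protein: float
    ranking_index: float


@dataclass
class SimpleGenetics:
    """Hypothetical stand-in for the Genetics class (only the fields used here)."""
    TBV_fat: float
    TBV_protein: float
    EBV_fat: float
    EBV_protein: float
    ranking_index: float


@dataclass
class SketchAnimal:
    """Hypothetical stand-in for Animal: current genetics plus its history."""
    genetics: SimpleGenetics
    genetic_history: list[GeneticHistory] = field(default_factory=list)

    def update_genetic_history(self, simulation_day: int) -> None:
        # Copy today's values into an immutable record so later changes to
        # self.genetics do not alter what was recorded for earlier days.
        g = self.genetics
        self.genetic_history.append(
            GeneticHistory(
                simulation_day=simulation_day,
                TBV_fat=g.TBV_fat,
                TBV_protein=g.TBV_protein,
                EBV_fat=g.EBV_fat,
                EBV_protein=g.EBV_protein,
                ranking_index=g.ranking_index,
            )
        )
```

Freezing the record is a design choice in this sketch: each day's entry stays a true snapshot even if the live Genetics object is updated in place.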

Why

The previous implementation used a single net_merit float value to represent animal genetics, which was an oversimplification. The new Genetics submodule provides a biologically meaningful representation of animal genetics by modeling true breeding values, estimated breeding values, phenotypes, and environmental effects for milk fat and protein traits. This enables more accurate simulation of genetic progress, selection decisions, and milk production variability within the herd.

How

The Genetics class computes TBV, EBV, phenotypes, and environmental effects based on birth year, animal type, parity, and optional dam genetic values. For newborn calves with known dam TBV values, genetics are initialized using parental inheritance. For all other animals (existing herd or calves without dam data), genetics are initialized based on population-level data from average_phenotype and top_listing_semen datasets loaded into AnimalConfig. The Animal constructor now requires a RufasTime object to determine birth year/month for genetic initialization. Genetic state is tracked over time via GeneticHistory records appended each simulation day. The net_merit field has been fully replaced by the ranking_index computed within the Genetics class.
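The two initialization paths described above can be sketched for a single trait as follows. This assumes the common textbook parent-average model (calf TBV = ½ × (sire TBV + dam TBV) plus a Mendelian sampling deviation), a sire TBV looked up by birth (year, month), and a population mean per birth year; the function name, parameters, and table shapes are hypothetical and may not match what the Genetics class actually does.

```python
import random
from typing import Optional


def initialize_trait_tbv(
    birth_year: int,
    birth_month: int,
    sire_tbv_by_month: dict[tuple[int, int], float],
    population_mean_by_year: dict[int, float],
    dam_tbv: Optional[float] = None,
    mendelian_sd: float = 0.0,
    population_sd: float = 0.0,
    rng: Optional[random.Random] = None,
) -> float:
    """Return a hypothetical TBV for one trait (e.g. fat or protein)."""
    if rng is None:
        rng = random.Random()
    if dam_tbv is not None:
        # Newborn calf with known dam: average the sire's and dam's TBV,
        # then add a random Mendelian sampling deviation.
        sire_tbv = sire_tbv_by_month[(birth_year, birth_month)]
        return 0.5 * (sire_tbv + dam_tbv) + rng.gauss(0.0, mendelian_sd)
    # Existing animal, or calf without dam data: sample around a
    # population-level mean for the birth year.
    return population_mean_by_year[birth_year] + rng.gauss(0.0, population_sd)
```

With both standard deviations left at 0.0 the result is deterministic, which makes the branching easy to check in a unit test.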

Test plan

  • Updated existing unit tests in test_animal.py, test_animal_config.py, test_herd_factory.py, test_herd_manager_daily_routines.py, test_herd_manager_herd_statistics.py, test_animal_module_reporter.py, test_milk_production.py, test_reproduction.py, and test_simulation_engine.py to accommodate the new Genetics integration and RufasTime parameter.
  • Added unit tests in test_animal_genetics.py for the Genetics class.
  • Updated end-to-end test metadata files to reference the new genetics input data files.

Input Changes

  • Added: input/data/animal/animal_genetics/NetMerit_HO.csv, input/data/animal/animal_genetics/TopListingSemen_HO.csv
  • Deleted: input/data/animal_genetics/TopListingSemen_HO.csv, input/data/animal_genetics/mean_phenotype.csv (moved to new location under input/data/animal/animal_genetics/)
  • Modified: All metadata JSON files updated to reference new genetics data file paths (freestall_e2e_metadata.json, open_lot_e2e_metadata.json, no_animal_e2e_metadata.json, example_freestall_dairy_metadata.json, example_open_lot_metadata.json, example_no_animal_metadata.json)
  • Modified: input/metadata/properties/default.json updated with genetics-related properties.

Output Changes

  • Animal.genetics: New Genetics object replacing the net_merit float attribute.
  • Animal.genetic_history: New list of GeneticHistory records tracking genetic state over time.
  • Animal.milk_statistics: Now includes TBV_fat, TBV_protein, E_permanent_fat, E_permanent_protein, E_temporary_fat, E_temporary_protein, phenotype_fat, phenotype_protein, EBV_fat, EBV_protein, ranking_index.
  • Animal.net_merit: Removed from Animal and all animal typed dicts.

Filter

Genetics History

"AnimalModuleReporter._report_all_animals_genetic_history.*"

Daily Milk

"AnimalModuleReporter.report_milk.*"

Daily Average Genetics (Herd, Calves, HeiferIs, HeiferIIs, HeiferIIIs, Cows)

"AnimalModuleReporter.report_average_genetics.*"
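Before exporting, it can help to preview which output variables a filter will select. The sketch below assumes the filter strings are treated as regular expressions matched against full variable names; the example variable names in the usage are hypothetical. Note that the unescaped "." matches any character, so these patterns are slightly looser than a literal prefix.

```python
import re

# The three filter patterns listed above.
FILTERS = {
    "Genetics History": r"AnimalModuleReporter._report_all_animals_genetic_history.*",
    "Daily Milk": r"AnimalModuleReporter.report_milk.*",
    "Daily Average Genetics": r"AnimalModuleReporter.report_average_genetics.*",
}


def matching_keys(pattern: str, keys: list[str]) -> list[str]:
    """Return the keys that the whole pattern matches, in their original order."""
    compiled = re.compile(pattern)
    return [key for key in keys if compiled.fullmatch(key)]
```

For example, matching_keys(FILTERS["Daily Milk"], keys) keeps only the names that start with "AnimalModuleReporter.report_milk".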

@github-actions
Contributor

github-actions bot commented Feb 5, 2026

Current Coverage: 99%

Mypy errors on genetics_implementation branch: 1677
Mypy errors on dev branch: 1677
No difference in error counts

@github-actions
Contributor

Current Coverage: 99%

Mypy errors on genetics_implementation branch: 1740
Mypy errors on dev branch: 1682
58 more errors on genetics_implementation branch

@github-actions
Contributor

🚨 Unauthorized changes detected in protected files. Please remove these changes if they are not intended.
🚨 Flake8 linting errors were found. Please fix the linting issues.

@github-actions
Contributor

Current Coverage: 99%

Mypy errors on genetics_implementation branch: 1739
Mypy errors on dev branch: 1681
58 more errors on genetics_implementation branch

@github-actions
Contributor

🚨 Unauthorized changes detected in protected files. Please remove these changes if they are not intended.

@YijingGong
Collaborator

Original PR: #2608
See design doc, PR description, and discussion there.

allisterakun marked this pull request as ready for review March 3, 2026 17:16
@jadamchick
Contributor

I just tried to run the branch as-is, and got the following error (pasted below). Looks like input manager is still looking for animal_net_merit in the metadata?

Starting task: 1/1
[23-Mar-2026_Mon_15-31-27.033331][ERROR][freestall] Metadata blobs error. Missing required file blobs: ['animal_net_merit']. Please add all missing file blobs to metadata.
[23-Mar-2026_Mon_15-31-27.045494][ERROR][freestall] Failed to finish task: 1/1 with output prefix: freestall. Failed to recover from error: Missing required file blobs: ['animal_net_merit']; traceback: Traceback (most recent call last):
  File "C:\Users\jms349\GitHub\RuFaS\RUFAS\task_manager.py", line 590, in task
    is_data_valid = TaskManager.handle_input_data_audit(args, input_manager, output_manager, True)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\jms349\GitHub\RuFaS\RUFAS\task_manager.py", line 753, in handle_input_data_audit
    is_data_valid = input_manager.start_data_processing(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\jms349\GitHub\RuFaS\RUFAS\input_manager.py", line 128, in start_data_processing
    self._validate_required_file_blobs(set(self.__metadata[ADDRESS_TO_INPUTS].keys()))
  File "C:\Users\jms349\GitHub\RuFaS\RUFAS\input_manager.py", line 182, in _validate_required_file_blobs
    raise ValueError(f"Missing required file blobs: {list(missing_blobs)}")
ValueError: Missing required file blobs: ['animal_net_merit']
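The traceback shows the check that fails: the set of required blob names is compared against the input keys in the metadata, and any leftovers raise a ValueError. A simplified sketch of that logic follows (the real method's signature and the contents of REQUIRED_FILE_BLOBS are not shown here, so both are assumptions). Because the metadata no longer provides animal_net_merit, the fix is to drop it from the required set, as noted under "What".

```python
def validate_required_file_blobs(available: set[str], required: set[str]) -> None:
    """Raise if any required blob name is absent from the metadata inputs."""
    missing_blobs = required - available
    if missing_blobs:
        # sorted() gives a stable message order; the original uses list().
        raise ValueError(f"Missing required file blobs: {sorted(missing_blobs)}")
```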

@allisterakun
Collaborator Author


Should be fixed now!

@github-actions
Contributor

Current Coverage: %

Mypy errors on genetics_implementation branch: 1269
Mypy errors on dev branch: 1214
55 more errors on genetics_implementation branch

@github-actions
Contributor

🚨 Unauthorized changes detected in protected files. Please remove these changes if they are not intended.
🚨 Some tests have failed.

Contributor


Update the file path to match the updated location?

@github-actions
Contributor

Current Coverage: 99%

Mypy errors on genetics_implementation branch: 1267
Mypy errors on dev branch: 1212
55 more errors on genetics_implementation branch

@github-actions
Contributor

🚨 Unauthorized changes detected in protected files. Please remove these changes if they are not intended.

allisterakun requested a review from jadamchick March 25, 2026 14:40
@github-actions
Contributor

Current Coverage: 99%

Mypy errors on genetics_implementation branch: 1239
Mypy errors on dev branch: 1212
27 more errors on genetics_implementation branch

@github-actions
Contributor

🚨 Unauthorized changes detected in protected files. Please remove these changes if they are not intended.
🚨 Flake8 linting errors were found. Please fix the linting issues.

@github-actions
Contributor

Current Coverage: 99%

Mypy errors on genetics_implementation branch: 1239
Mypy errors on dev branch: 1212
27 more errors on genetics_implementation branch

@github-actions
Contributor

🚨 Unauthorized changes detected in protected files. Please remove these changes if they are not intended.

@jadamchick
Contributor

Thanks for the quick updates, Allister!
I'm still just running this with example_freestall to get a sense for things. How do you recommend viewing and making sense of the genetic history output? I used the provided filter to export a CSV, but when I open it in Excel I get a message that the file is too big, and the information seems jumbled and impossible to interpret (no row headings, inconsistent formatting, etc.).
"AnimalModuleReporter._report_all_animals_genetic_history.*"

@KFosterReed
Contributor

Hi @YijingGong! Good news is that I FINALLY am coming up for air and this is my top priority for the rest of the week :)

More good news: I was able to run both example farms and collected the daily average genetic values to start with. My next step is to look at those values, come up with a plan for interpreting them, and identify some edge cases to test.

One thing I'd like to point out is that this method really increases the simulation time. Here are the times I got for the two example herds on this branch and dev:

example_freestall

  • dev: 54 seconds
  • genetics: 140 seconds

example_open_lot

  • dev: 436 seconds
  • genetics: 2819 seconds
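Expressed as ratios, the timings above work out to roughly a 2.6x slowdown for example_freestall and roughly 6.5x for example_open_lot. A quick check of that arithmetic using the reported numbers:

```python
# Wall-clock times (seconds) reported above for dev vs. the genetics branch.
timings_s = {
    "example_freestall": {"dev": 54, "genetics": 140},
    "example_open_lot": {"dev": 436, "genetics": 2819},
}


def slowdown(dev_s: float, genetics_s: float) -> float:
    """Ratio of genetics-branch runtime to dev runtime."""
    return genetics_s / dev_s


for farm, t in timings_s.items():
    print(f"{farm}: {slowdown(t['dev'], t['genetics']):.2f}x")
# example_freestall: 2.59x
# example_open_lot: 6.47x
```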

Reducing simulation time is not our primary objective, but we do need to keep it in mind so that we don't make the model unusable for people outside of research domains. One option would be to develop a way to "turn the genetics representation off". @allisterakun and @YijingGong, from your intimate knowledge of the methods, does this seem feasible? If so, it wouldn't need to happen in this PR, but we should open an issue and start working on it so it is in place before our next release.
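One possible shape for such a toggle, sketched under assumptions: a hypothetical genetics_enabled flag read from configuration, checked in the daily loop so that core routines always run while the per-animal genetics work is skipped when the flag is off. ToyAnimal, run_daily_routines, and the flag name are illustrative only; a real disabled mode would still need defaults for anything downstream that now reads genetics values (for example milk production).

```python
from dataclasses import dataclass, field


@dataclass
class ToyAnimal:
    """Hypothetical animal carrying only what the toggle example needs."""
    animal_id: int
    days_simulated: int = 0
    genetic_history: list[int] = field(default_factory=list)

    def run_core_daily_routines(self) -> None:
        # Stand-in for growth, milk, reproduction, etc.; always runs.
        self.days_simulated += 1

    def update_genetic_history(self, simulation_day: int) -> None:
        # Stand-in for the genetics work that the flag would skip.
        self.genetic_history.append(simulation_day)


def run_daily_routines(animals: list[ToyAnimal], simulation_day: int, genetics_enabled: bool) -> None:
    """Run one simulated day; genetics updates happen only when the flag is on."""
    for animal in animals:
        animal.run_core_daily_routines()
        if genetics_enabled:
            animal.update_genetic_history(simulation_day)
```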

More to come soon!

@KFosterReed
Contributor

Hey Team! I dug into this a bit more today and did find a way to break it by trying to run a simulation that starts in 2001:

[screenshot of the error]

You can replicate this error by changing the simulation start date to "2001:1" in the config input json.

@YijingGong
Collaborator

Thanks for doing this testing, Kristan — really useful numbers. The open_lot slowdown especially (6x+) is hard to ignore. I agree a toggle to disable the genetics module is the right direction — and I think it's feasible, though @allisterakun should confirm. I will open an issue to track it so it's on the roadmap before the next release.

@YijingGong
Collaborator

Hi Julie! The genetic history output was hard for me to parse too — I ended up using the VSCode debugger to step through the values day by day, which made it much easier to reason about. Here's how:

  1. Activate your virtual environment in VSCode (top search bar → select your interpreter)
  2. Generate a launch.json for the debugger (AI can generate it well):
     [screenshot]
  3. Set a breakpoint inside the genetics update logic, then step through to watch how TBV, EBV, and phenotype values evolve each day for individual animals
  4. Click the Run and Debug button to start:
     [screenshot]
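As a complement to the debugger, the history can be flattened into one row per animal per day with explicit headers, optionally keeping only every Nth day so the file stays under Excel's limit of 1,048,576 rows. This sketch assumes the history is available as a mapping of animal id to a list of per-day records with a simulation_day key; the actual reporter output format may differ, so treat history_to_csv and its parameters as a hypothetical post-processing pattern.

```python
import csv
import io


def history_to_csv(
    history: dict[int, list[dict[str, float]]],
    columns: list[str],
    day_step: int = 1,
) -> str:
    """Write one header row, then one row per (animal, day), sorted by both."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["animal_id", "simulation_day", *columns])
    for animal_id in sorted(history):
        for record in sorted(history[animal_id], key=lambda r: r["simulation_day"]):
            # Thin the output by keeping only every day_step-th day.
            if record["simulation_day"] % day_step != 0:
                continue
            # Missing trait values become empty cells instead of raising.
            writer.writerow([animal_id, record["simulation_day"], *(record.get(c, "") for c in columns)])
    return buffer.getvalue()
```

For example, history_to_csv(history, ["TBV_fat", "EBV_fat", "phenotype_fat"], day_step=7) would produce a weekly table that opens cleanly in a spreadsheet.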

@YijingGong
Collaborator


Thanks for catching this, Kristan! I was able to replicate it. The crash is in _calculate_newborn_calf_tbv_values() — it does a direct dictionary lookup on the TopListingSemen data with no bounds checking, so any calf born before 2004-09 (when the data starts) hits a KeyError. There's currently no validation or fallback for this anywhere in the genetics code.
@KFosterReed, before @allisterakun fixes it: should we extrapolate the semen genetic trend backward, or clamp to the earliest available value and add a warning (like lactation_curve.py does for its own year bounds)? Happy to defer to your judgment on what's more scientifically appropriate.
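A sketch of the clamp-and-warn option, assuming the TopListingSemen values can be represented as a dict keyed by (year, month). The function name and table shape are hypothetical. Dates before the first entry (2004-09 in the current data) or after the last one fall back to the nearest endpoint with a warning, while a gap inside the covered range still raises KeyError so that missing data is not silently masked. The extrapolation option would replace the clamp branch with a fitted trend.

```python
import warnings


def lookup_semen_tbv(table: dict[tuple[int, int], float], year: int, month: int) -> float:
    """Look up a sire TBV by (year, month), clamping out-of-range dates."""
    key = (year, month)
    if key in table:
        return table[key]
    earliest, latest = min(table), max(table)  # tuples compare year first, then month
    if earliest <= key <= latest:
        # Inside the covered range but missing: a real data gap, not a bounds issue.
        raise KeyError(f"No TopListingSemen entry for {year}-{month:02d}")
    clamped = earliest if key < earliest else latest
    warnings.warn(
        f"{year}-{month:02d} is outside the TopListingSemen range; using {clamped[0]}-{clamped[1]:02d}.",
        stacklevel=2,
    )
    return table[clamped]
```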


Labels

None yet

Projects

None yet


4 participants